NSF PAR Search | NSF Public Access Repository

Partitioning Communication Streams Into Graph Snapshots

https://doi.org/10.1109/TNSE.2022.3223614

Wendt, Jeremy D.; Field, Richard V.; Phillips, Cynthia A.; Prasadan, Arvind; Wilson, Tegan; Soundarajan, Sucheta; Bhowmick, Sanjukta (December 2022, IEEE Transactions on Network Science and Engineering)

We present EASEE (Edge Advertisements into Snapshots using Evolving Expectations) for partitioning streaming communication data into static graph snapshots. Given streaming communication events (A talks to B), EASEE identifies when enough events have accumulated to form a static graph (a snapshot). EASEE uses combinatorial statistical models to adaptively find when a snapshot is stable, while watching for significant data shifts that indicate a new snapshot should begin. If snapshots are not found carefully, they poorly represent the underlying data, and downstream graph analytics fail; we show a community detection example. We demonstrate EASEE's strengths on several real-world datasets and its accuracy on known-answer synthetic datasets. Results on the synthetic datasets show that (1) EASEE finds known-answer data shifts very quickly, and (2) ignoring these shifts drastically affects analytics on the resulting snapshots. We show that previous work misses these shifts. Further, we evaluate EASEE on seven real-world datasets (330K to 2.5B events) and find snapshot-over-time behaviors missed by previous work. Finally, we show that the measured properties of the resulting snapshots (e.g., graph density) depend on how snapshots are identified from the communication event stream. In particular, EASEE's snapshots do not generally “densify” over time, contradicting influential earlier results that used simpler partitioning methods.
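For contrast with EASEE's adaptive approach, the sketch below shows the kind of simple partitioning the abstract argues against: cutting the event stream into fixed-size blocks of events and measuring each snapshot's edge density. It is not EASEE. The fixed window size, the undirected-edge convention, and the names `fixed_window_snapshots` and `edge_density` are hypothetical choices made only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Event = Tuple[str, str]   # one communication event: (sender, receiver)
Edge = FrozenSet[str]     # undirected edge {sender, receiver}


def fixed_window_snapshots(events: Iterable[Event], window: int) -> List[Set[Edge]]:
    """Cut the stream into consecutive blocks of `window` events.

    Each snapshot is the set of distinct undirected edges seen in its block;
    repeated events and self-loops add no new edges. Naive baseline, not EASEE.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    snapshots: List[Set[Edge]] = []
    current: Set[Edge] = set()
    seen = 0
    for a, b in events:
        if a != b:
            current.add(frozenset((a, b)))
        seen += 1
        if seen == window:          # block is full: close this snapshot
            snapshots.append(current)
            current, seen = set(), 0
    if seen:                        # keep a final, partially filled block
        snapshots.append(current)
    return snapshots


def edge_density(edges: Set[Edge]) -> float:
    """Fraction of possible undirected edges present among the snapshot's nodes."""
    nodes = {v for e in edges for v in e}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)


events = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "D")]
for snap in fixed_window_snapshots(events, window=2):
    print(sorted(tuple(sorted(e)) for e in snap), round(edge_density(snap), 3))
```

Changing `window` changes which events share a snapshot and therefore the measured density, which is the sensitivity to the partitioning method that the abstract describes.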

Automatic HBM Management: Models and Algorithms

https://doi.org/10.1145/3490148.3538570

DeLayo, Daniel; Zhang, Kenny; Agrawal, Kunal; Bender, Michael A.; Berry, Jonathan W.; Das, Rathish; Moseley, Benjamin; Phillips, Cynthia A. (July 2022, Proc. 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))

Timely Reporting of Heavy Hitters Using External Memory

https://doi.org/10.1145/3472392

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Phillips, Cynthia A. (December 2021, ACM Transactions on Database Systems)
Christopher Jermaine (Ed.)
Given an input stream $S$ of size $N$, a $\varphi$-heavy hitter is an item that occurs at least $\varphi N$ times in $S$. The problem of finding heavy hitters has been studied extensively in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its $T = \varphi N$-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space ($\Omega(N)$ words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU-bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device’s random I/O throughput, i.e., ≈100K observations per second.
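To make the TED definition concrete, here is a minimal in-RAM sketch that reports an item at the exact moment of its $T$-th occurrence. It assumes the stream length $N$ is known in advance and takes $T = \lceil \varphi N \rceil$ when $\varphi N$ is not an integer (both are assumptions of this sketch). Because it keeps one counter per distinct item, its space can grow linearly in $N$; that is the space cost that motivates the paper's external-memory structures, which this sketch does not implement. The function name `timely_heavy_hitters` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Iterator, Tuple


def timely_heavy_hitters(
    stream: Iterable[Hashable], n: int, phi: float
) -> Iterator[Tuple[int, Hashable]]:
    """Yield (position, item) exactly when `item` reaches its T-th occurrence.

    Assumes the stream length `n` is known up front and uses T = ceil(phi * n).
    Exact counting: no false positives or negatives and zero reporting delay,
    but one counter per distinct item (worst case linear in n).
    """
    threshold = max(1, math.ceil(phi * n))
    counts: Counter = Counter()
    for pos, item in enumerate(stream):
        counts[item] += 1
        if counts[item] == threshold:   # report once, at the threshold crossing
            yield pos, item


print(list(timely_heavy_hitters("abacabad", n=8, phi=0.25)))
```

For example, with the stream `"abacabad"` ($N = 8$) and $\varphi = 0.25$, the threshold is $T = 2$, so `a` is reported at position 2 and `b` at position 5; `c` and `d` never reach the threshold.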

Timely Reporting of Heavy Hitters Using External Memory

https://doi.org/10.1145/3472392

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A. (December 2021, ACM Transactions on Database Systems)

Using advanced data structures to enable responsive security monitoring

https://doi.org/10.1007/s10586-021-03463-5

Vorobyeva, Janet; Delayo, Daniel R.; Bender, Michael A.; Farach-Colton, Martín; Pandey, Prashant; Phillips, Cynthia A.; Singh, Shikha; Thomas, Eric D.; Kroeger, Thomas M. (January 2022, Cluster Computing)

How to Manage High-Bandwidth Memory Automatically

https://doi.org/10.1145/3350755.3400233

Das, Rathish; Agrawal, Kunal; Bender, Michael A.; Berry, Jonathan; Moseley, Benjamin; Phillips, Cynthia A. (July 2020, Symposium on Parallelism in Algorithms and Architectures)

Timely Reporting of Heavy Hitters Using External Memory

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A. (January 2021, ACM Transactions on Database Systems)

Write-Optimized Skip Lists

https://doi.org/10.1145/3034786.3056117

Bender, Michael; Farach-Colton, Martín; Johnson, Rob; Mauras, Simon; Mayer, Tyler; Phillips, Cynthia; Xu, Helen (January 2017, PODS '17 Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems)

The skip list is an elegant dictionary data structure that is commonly deployed in RAM. A skip list with $N$ elements supports searches, inserts, and deletes in $O(\log N)$ operations with high probability (w.h.p.) and range queries returning $K$ elements in $O(\log N + K)$ operations w.h.p. A seemingly natural way to generalize the skip list to external memory with block size $B$ is to “promote” with probability $1/B$, rather than $1/2$. However, there are practical and theoretical obstacles to getting the skip list to retain its efficient performance, space bounds, and high-probability guarantees. We give an external-memory skip list that achieves write-optimized bounds. That is, for $0 < \varepsilon < 1$, range queries take $O(\log_{B^\varepsilon} N + K/B)$ I/Os w.h.p. and insertions and deletions take $O((\log_{B^\varepsilon} N)/B^{1-\varepsilon})$ amortized I/Os w.h.p. Our write-optimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior write-optimized data structures such as $B^\varepsilon$-trees or LSM-trees. These data structures are deployed in high-performance databases and file systems.
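The promotion rule mentioned in the abstract can be illustrated with an ordinary in-RAM skip list whose promotion probability `p` is a parameter: `p = 0.5` is the classic choice, and `p = 1/B` is the naive external-memory generalization. This is a minimal sketch of a standard skip list (search, insert, and range query; deletion omitted), not the paper's write-optimized structure; the class and method names and the level cap `MAX_LEVEL` are hypothetical.

```python
import random
from typing import List, Optional


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level: int):
        self.key = key
        self.forward: List[Optional["_Node"]] = [None] * level  # next node per level


class SkipList:
    """In-RAM skip list with a tunable promotion probability p."""

    MAX_LEVEL = 32  # hypothetical cap on tower height

    def __init__(self, p: float = 0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel; its key is never compared
        self.level = 1                           # number of levels currently in use

    def _random_level(self) -> int:
        # Promote one more level with probability p each time (geometric height).
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.p:
            lvl += 1
        return lvl

    def _find_pred(self, key) -> _Node:
        # Walk right while the next key is smaller, then drop down a level.
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        return x

    def insert(self, key) -> None:
        update = [self.head] * self.MAX_LEVEL  # predecessor at each level
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        nxt = x.forward[0]
        if nxt is not None and nxt.key == key:
            return  # key already present
        lvl = self._random_level()
        self.level = max(self.level, lvl)  # new top levels start at the head
        node = _Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, key) -> bool:
        nxt = self._find_pred(key).forward[0]
        return nxt is not None and nxt.key == key

    def range_query(self, lo, hi) -> list:
        """Keys in [lo, hi]: one search, then a scan along the bottom level."""
        out = []
        x = self._find_pred(lo).forward[0]
        while x is not None and x.key <= hi:
            out.append(x.key)
            x = x.forward[0]
        return out


sl = SkipList(p=0.5, seed=1)
for k in [5, 1, 9, 3, 7, 3]:
    sl.insert(k)
print(sl.range_query(2, 8))
```

Each key reaches level $i + 1$ with probability $p^i$, so a list of $N$ keys has about $\log_{1/p} N$ levels, and a search inspects about $1/p$ nodes per level in expectation. With $p = 1/B$ the list has roughly $\log_B N$ levels, but a search in this naive version scans about $B$ nodes per level; unless those nodes share a block, each visit could cost a separate I/O.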